NSF PAR Search | NSF Public Access Repository


Onyx: A 12nm 756 GOPS/W Coarse-Grained Reconfigurable Array for Accelerating Dense and Sparse Applications

https://doi.org/10.1109/VLSITechnologyandCir46783.2024.10631383

Koul, Kalhan; Strange, Maxwell; Melchert, Jackson; Carsello, Alex; Mei, Yuchen; Hsu, Olivia; Kong, Taeyoung; Chen, Po-Han; Ke, Huifeng; Zhang, Keyi; et al. (June 2024, IEEE)

APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis

https://doi.org/10.1145/3582016.3582070

Melchert, Jackson; Feng, Kathleen; Donovick, Caleb; Daly, Ross; Sharma, Ritvik; Barrett, Clark; Horowitz, Mark A.; Hanrahan, Pat; Raina, Priyanka (March 2023, ACM)

The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy efficiency of an application running on the CGRA. This paper presents APEX, an automated approach for generating specialized PE architectures for an application or an application domain. APEX first analyzes application domain benchmarks using frequent subgraph mining to extract commonly occurring computational subgraphs. APEX then generates specialized PEs by merging subgraphs using a datapath graph merging algorithm. The merged datapath graphs are translated into a PE specification from which we automatically generate the PE hardware description in Verilog along with a compiler that maps applications to the PE. The PE hardware and compiler are inserted into a flexible CGRA generation and compilation toolchain that allows for agile evaluation of CGRAs. We evaluate APEX for two domains, machine learning and image processing. For image processing applications, our automatically generated CGRAs with specialized PEs achieve from 5% to 30% less area and from 22% to 46% less energy compared to a general-purpose CGRA. For machine learning applications, our automatically generated CGRAs consume 16% to 59% less energy and 22% to 39% less area than a general-purpose CGRA. This work paves the way for the creation of application domain-driven design-space exploration frameworks that automatically generate efficient programmable accelerators, with a much lower design effort for both hardware and compiler generation.
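The mining step described in the APEX abstract can be illustrated with a deliberately tiny sketch. This is a simplification under stated assumptions, not APEX's actual algorithm: it only counts connected two-node subgraphs (single producer-to-consumer edges between operations), whereas real frequent subgraph miners enumerate larger subgraphs. The graph encoding, `IGNORED_OPS`, `edge_patterns`, and `frequent_patterns` are all hypothetical names chosen for illustration. Support is counted per benchmark graph, so a pattern that appears in many applications ranks higher than one repeated inside a single application.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical encoding: node id -> (op name, list of predecessor node ids).
Graph = Dict[str, Tuple[str, List[str]]]
Pattern = Tuple[str, str]  # (producer op, consumer op): a connected 2-node subgraph

# Assumption: graph inputs and constants are not PE operations, so skip them.
IGNORED_OPS = {"input", "const"}


def edge_patterns(graph: Graph) -> Counter:
    """Count occurrences of each (producer op, consumer op) edge in one graph."""
    counts: Counter = Counter()
    for op, preds in graph.values():
        if op in IGNORED_OPS:
            continue
        for pred in preds:
            producer_op = graph[pred][0]
            if producer_op not in IGNORED_OPS:
                counts[(producer_op, op)] += 1
    return counts


def frequent_patterns(benchmarks: List[Graph], min_support: int) -> List[Tuple[Pattern, int]]:
    """Return patterns found in at least `min_support` benchmark graphs,
    most frequent first (ties broken alphabetically)."""
    support: Counter = Counter()
    for graph in benchmarks:
        support.update(set(edge_patterns(graph)))  # each graph counts a pattern at most once
    hits = [(p, s) for p, s in support.items() if s >= min_support]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Two toy "benchmarks": a multiply-accumulate and a scaled, shifted sum.
g1: Graph = {"a": ("input", []), "b": ("input", []),
             "m": ("mul", ["a", "b"]), "s": ("add", ["m", "a"])}
g2: Graph = {"x": ("input", []), "c": ("const", []),
             "m": ("mul", ["x", "c"]), "s": ("add", ["m", "c"]), "r": ("shr", ["s", "c"])}

print(frequent_patterns([g1, g2], min_support=2))  # [(('mul', 'add'), 2)]
```

In this toy run, the multiply-then-add pattern is shared by both graphs, making it the kind of candidate that a subsequent merging step (not shown here) would fold into a specialized PE datapath.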
The Sparse Abstract Machine

https://doi.org/10.1145/3582016.3582051

Hsu, Olivia; Strange, Maxwell; Sharma, Ritvik; Won, Jaeyeon; Olukotun, Kunle; Emer, Joel S.; Horowitz, Mark A.; Kjølstad, Fredrik (March 2023, Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM dataflow graphs naturally separate tensor formats from algorithms and are expressive enough to incorporate arbitrary iteration orderings and many hardware-specific optimizations. We also present Custard, a compiler from a high-level language to SAM that demonstrates SAM's usefulness as an intermediate representation. We automatically bind from SAM to a streaming dataflow simulator. We evaluate the generality and extensibility of SAM, explore the performance space of sparse tensor algebra optimizations using SAM, and show SAM's ability to represent dataflow hardware.
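To make the streaming-dataflow idea in the SAM abstract more concrete, the sketch below composes Python generators into a small pipeline for a sparse dot product. A scanner streams the stored coordinates and values of each compressed vector, an intersection stage keeps only coordinates present in both streams, a multiply stage combines the paired values, and a reducer sums them. The stage names (`level_scan`, `intersect`, `multiply`, `reduce_sum`) are hypothetical stand-ins loosely modeled on the kinds of primitives the abstract describes. The sketch handles only a single compressed level and omits the control tokens and multi-level streams that a full SAM graph would need.

```python
from typing import Iterator, List, Tuple


def level_scan(crd: List[int], vals: List[float]) -> Iterator[Tuple[int, float]]:
    """Stream (coordinate, value) pairs from one compressed level.
    Assumes `crd` is sorted ascending, as in a CSR-style coordinate array."""
    yield from zip(crd, vals)


def intersect(a: Iterator[Tuple[int, float]],
              b: Iterator[Tuple[int, float]]) -> Iterator[Tuple[int, float, float]]:
    """Two-pointer merge: emit (coord, a_val, b_val) only where both streams have the coord."""
    x, y = next(a, None), next(b, None)
    while x is not None and y is not None:
        if x[0] == y[0]:
            yield x[0], x[1], y[1]
            x, y = next(a, None), next(b, None)
        elif x[0] < y[0]:
            x = next(a, None)  # advance the stream that is behind
        else:
            y = next(b, None)


def multiply(stream: Iterator[Tuple[int, float, float]]) -> Iterator[Tuple[int, float]]:
    """Elementwise product of the paired values."""
    for coord, u, v in stream:
        yield coord, u * v


def reduce_sum(stream: Iterator[Tuple[int, float]]) -> float:
    """Collapse the coordinate stream into a scalar."""
    return sum(v for _, v in stream)


# x has nonzeros at 0, 2, 5; y has nonzeros at 2, 3, 5. Only 2 and 5 overlap.
x_crd, x_vals = [0, 2, 5], [1.0, 2.0, 3.0]
y_crd, y_vals = [2, 3, 5], [4.0, 5.0, 6.0]

dot = reduce_sum(multiply(intersect(level_scan(x_crd, x_vals), level_scan(y_crd, y_vals))))
print(dot)  # 26.0  (2.0*4.0 + 3.0*6.0)
```

Because the format-specific logic lives only in the scanner, replacing `level_scan` with a scanner over a different storage format would leave the intersect, multiply, and reduce stages unchanged, a small-scale analogue of the format/algorithm separation the abstract mentions.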